Support a custom comparison operator in DeviceReduce::ArgMin|Max#8285
Support a custom comparison operator in DeviceReduce::ArgMin|Max#8285bernhardmgruber merged 12 commits intoNVIDIA:mainfrom
DeviceReduce::ArgMin|Max#8285Conversation
DeviceReduce::ArgMinDeviceReduce::ArgMin
| // TODO(bgruber): this constraint is not accurate, since the implementation will compare the value types of | ||
| // ExtremumOutIteratorT, which is wrong IMO | ||
| ::cuda::std::enable_if_t<::cuda::std::indirectly_comparable<InputIteratorT, InputIteratorT, CompareOpT>, int> = 0> |
There was a problem hiding this comment.
Instead of InputIteratorT we should use non_void_value_t<ExtremumOutIteratorT, it_value_t<InputIteratorT>>, but that just "feels" wrong here. But this is what the implementation does. What do the reviewers think?
I think the implementation should actually be changed to compare the input values, not the converted ones.
There was a problem hiding this comment.
I do not follow why that constraint is wrong? We want to ensure that the input sequence is comparable with the passed operator. Why should we compare the ExtremumOutputIteratorT
There was a problem hiding this comment.
The reduction implementation does not call compare_op(d_in[i], d_in[j]), it calls something like:
using input_value_t = it_value_t<InputIteratorT>;
using accum_t = non_void_value_t<ExtremumOutIteratorT, input_value_t>;
accum_t a = d_in[i];
accum_t b = d_in[j];
compare_op(a, b);So it performs a conversion of the input value to the output iterator's value_type before comparing. That can be a totally different type.
I think this is a bug itself, but outside the scope of this PR.
This comment has been minimized.
This comment has been minimized.
NaderAlAwar
left a comment
There was a problem hiding this comment.
Suggestion: the issue being closed mentions ArgMax as well in the title, but this PR only appears to add public custom-comparator overloads and
test coverage for ArgMin. The internal refactor is more general, but DeviceReduce::ArgMax still seems to expose only the old no-comparator API. I would either create a separate issue for ArgMax or expose the custom comparator overload as well.
6814c46 to
77a24f4
Compare
So I temporarily added a new overload for Also, while Finally, I considered naming the new overload not
As my last paragraph points out, the new overload actually generalizes over both, |
This comment has been minimized.
This comment has been minimized.
|
@bernhardmgruber those are good points, I hadn't considered that. Looking into this some more, since the standard library and Thrust already expose comparator overloads for both My worry about
I do agree that this is a little confusing. I don't feel too strongly about this either way since I have not used this extensively but my intuition would be to stay consistent with existing standards unless we believe they are broken in some way. |
This comment has been minimized.
This comment has been minimized.
| // Initial value for empty problems, according to documented contract | ||
| const auto empty_problem_extremum = static_cast<output_extremum_t>([] { | ||
| if constexpr (::cuda::std::is_same_v<ReductionOpT, arg_min>) | ||
| { | ||
| return ::cuda::std::numeric_limits<input_value_t>::max(); | ||
| } | ||
| else if constexpr (::cuda::std::is_same_v<ReductionOpT, arg_max>) | ||
| { | ||
| return ::cuda::std::numeric_limits<input_value_t>::lowest(); | ||
| } | ||
| else | ||
| { | ||
| return input_value_t{}; | ||
| } | ||
| }()); | ||
| auto initial_value = empty_problem_init_t<per_partition_accum_t>{{PerPartitionOffsetT{1}, empty_problem_extremum}}; |
There was a problem hiding this comment.
I am really unhappy that we actually need an initial value
There was a problem hiding this comment.
It's only needed for the case where the user passes num_items == 0.
There was a problem hiding this comment.
I would love for us to change the implementation so that in the legacy API without a comparison operator we do the return value thing and for the new API we only return indices, which in that case can just be 0
| // TODO(bgruber): this constraint is not accurate, since the implementation will compare the value types of | ||
| // ExtremumOutIteratorT, which is wrong IMO | ||
| ::cuda::std::enable_if_t<::cuda::std::indirectly_comparable<InputIteratorT, InputIteratorT, CompareOpT>, int> = 0> |
There was a problem hiding this comment.
I do not follow why that constraint is wrong? We want to ensure that the input sequence is comparable with the passed operator. Why should we compare the ExtremumOutputIteratorT
| cudaStream_t stream = 0) | ||
| { | ||
| return ArgMax( | ||
| d_temp_storage, temp_storage_bytes, d_in, d_max_out, d_index_out, num_items, ::cuda::std::less{}, stream); |
| // Less-than comparator for an index/value pair that compares values first, and indices when the values are equal | ||
| template <typename ValueLessThen = ::cuda::std::less<>> | ||
| struct arg_less : ValueLessThen | ||
| { | ||
| /// Boolean max operator, preferring the item having the smaller offset in | ||
| /// case of ties | ||
| template <typename T, typename OffsetT> | ||
| _CCCL_HOST_DEVICE _CCCL_FORCEINLINE ::cuda::std::pair<OffsetT, T> | ||
| operator()(const ::cuda::std::pair<OffsetT, T>& a, const ::cuda::std::pair<OffsetT, T>& b) const | ||
| { | ||
| if ((b.second > a.second) || ((a.second == b.second) && (b.first < a.first))) | ||
| const auto& less = static_cast<const ValueLessThen&>(*this); | ||
| if (less(b.second, a.second) || (!less(a.second, b.second) && b.first < a.first)) | ||
| { | ||
| return b; | ||
| } | ||
|
|
||
| return a; | ||
| } |
There was a problem hiding this comment.
Important: Inheritance is almost always worse than making it a member.
There was a problem hiding this comment.
But with a member, we are not getting EBCO. But maybe that's not so important here.
There was a problem hiding this comment.
Hmm, I think if I move to a data member, aggregate init would no longer work with the deduction guide in C++17. This can be worked around of course. Do you insist on this change, or can I save myself 43s of typing?
|
|
||
| //! @brief Binary functor swapping the arguments to ``operator()`` before forwarding to an inner functor | ||
| template <typename Predicate> | ||
| struct swap_args : Predicate |
There was a problem hiding this comment.
Question: Why arent we just using not_fn
There was a problem hiding this comment.
Because non_fun(less{}) is not the same as greater{}, it's greater_equal{}. It should actually not matter, since we are returning the first element that matches the predicate. But I felt swapping arguments is more true.
DeviceReduce::ArgMinDeviceReduce::ArgMin|Max
Seems a bit excessive: https://github.com/NVIDIA/cccl/actions/runs/24074951429/job/70228571613?pr=8285 |
This comment has been minimized.
This comment has been minimized.
8b840ad to
ee9c2a4
Compare
Solved |
🥳 CI Workflow Results🟩 Finished in 1h 42m: Pass: 100%/269 | Total: 9d 10h | Max: 1h 37m | Hits: 62%/176731See results here. |
Fixes: #6123